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Abstract 

Background: To understand transcriptional regulatory networks (TRNs), especially the coordinated dynamic 
regulation between transcription factors (TFs) and their corresponding target genes during development, 
computational approaches would represent significant advances in the genome-wide expression analysis. The 
major challenges for the experiments include monitoring the time-specific TFs' activities and identifying the 
dynamic regulatory relationships between TFs and their target genes, both of which are currently not yet available 
at the large scale. However, various methods have been proposed to computationally estimate those activities and 
regulations. During the past decade, significant progresses have been made towards understanding pollen 
development at each development stage under the molecular level, yet the regulatory mechanisms that control 
the dynamic pollen development processes remain largely unknown. Here, we adopt Networks Component 
Analysis (NCA) to identify TF activities over time couse, and infer their regulatory relationships based on the 
coexpression of TFs and their target genes during pollen development. 

Results: We carried out meta-analysis by integrating several sets of gene expression data related to Arabidopsis 
tholiono pollen development (stages range from UNM, BCP, TCP, HP to 0.5 hr pollen tube and 4 hr pollen tube). 
We constructed a regulatory network, including 19 TFs, 101 target genes and 319 regulatory interactions. The 
computationally estimated TF activities were well correlated to their coordinated genes' expressions during the 
development process. We clustered the expression of their target genes in the context of regulatory influences, 
and inferred new regulatory relationships between those TFs and their target genes, such as transcription factor 
WRKY34, which was identified that specifically expressed in pollen, and regulated several new target genes. Our 
finding facilitates the interpretation of the expression patterns with more biological relevancy, since the clusters 
corresponding to the activity of specific TF or the combination of TFs suggest the coordinated regulation of TFs to 
their target genes. 

Conclusions: Through integrating different resources, we constructed a dynamic regulatory network of Arabidopsis 
tholiono during pollen development with gene coexpression and NCA. The network illustrated the relationships 
between the TFs' activities and their target genes' expression, as well as the interactions between TFs, which 
provide new insight into the molecular mechanisms that control the pollen development. 
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Background 

Genome specifies the gene expression programs that 
control cells' differentiation through transcriptional reg- 
ulatory networks, which are characterized as the 
dynamic interactions between transcription factors and 
their target genes during development. Transcription 
factors regulate the expression of their target genes at 
transcriptional level with spatiotemporal specificity, thus 
the modification of transcription factor activity can dra- 
matically alter the gene expression profile. The primary 
challenge to understand the transcriptional regulation 
network is to measure the activities of the transcription 
factors at genome-scale, which are not yet practicable. 
However, computational methods have recently been 
developed to infer the transcription factor activities and 
the regulatory relationships between TFs and their tar- 
get-genes. 

Recent development of high-throughput technologies 
has made it possible to measure the expression activities 
of transcription factors and their target genes at the 
genome-scale. Microarrays can detect the expression 
levels of thousands of genes simultaneously [1]. But 
identifying transcription factor activities at such scale is 
still a challenge, especially for plants. Several technolo- 
gies for assessing transcriptional activities, such as ChlP- 
chip, flow cytometer, have their inherent limitation on 
genome-scale [2-4] and merely detect the activities at 
specific time point. In order to utilize the genome 
expression profile and compensate the inability to assay 
transcription factor activity on the genome-scale, many 
computational tools have been developed to accomplish 
this task through inferring gene regulatory networks 
[5-8]. One of these approaches, Network Component 
Analysis (NCA) is to determine both activities and regu- 
latory influences for a set of transcription factors on 
known target genes [9]. It has been successfully applied 
in several species and in various research perspectives, 
including yeast cell cycle [9] and cytokinesis-related 
gene regulation [10], time course of E. coli protein [11], 
knockout analysis in mouse [12], and transcriptional 
regulatory network of human [13]. 

In flowering plants, the male gametophyte (or pollen 
grain) plays a vital role in plant fertility through genera- 
tion and delivery of the male gametes to the embryo sac 
for double fertilization. The male gametophyte develop- 
ment is a complex process that requires the coordinated 
participation of various cells and tissue types, and their 
associated specific gene expression patterns. The avail- 
ability of the genome sequence of Arabidopsis (The Ara- 
bidopsis Genome Initiative, 2000) and the concomitant 
accumulation in available transcriptional profile data 
(TAIR) make Arabidopsis a preferable model plant for 
large scale genetic studies of pollen development. In 



previous studies, several sets of gene expression profiles 
for Arabidopsis pollen development time series have 
been generated [14-18]. These data cover almost all the 
stages of Arabidopsis pollen development: from unin- 
ucleate microspores, bicellular pollen, tricellular pollen, 
mature pollen grain, the 0.5 hr pollen tube, to 4 hr pol- 
len tube. Besides the availability of those gene expres- 
sion profile data, the researches on the TFs in 
Arabidopsis become increasing intensive, and a number 
of new transcription factors has been identified, either 
experimentally confirmed or computationally predicted. 
The total transcription factors of A. thaliana are pro- 
posed to be more than 2000 according to the four 
representative databases of Arabidopsis transcription 
factors: RARTF [19], AGRIS [20], DATF [21], PlnTFDB 
[22]. Among them, a few families of transcription factors 
have been intensively examined for their functionalities 
in development. However, the data for regulatory rela- 
tionships between these transcription factors and their 
confirmed target genes are very limited. 

During the past decade, major advances in genetic and 
genomic technologies have facilitated our understanding 
of pollen development at the molecular level. The achieve- 
ment includes the highly annotated A. thaliana genome, 
comprehensive A. thaliana transcriptomic datasets, and 
various gametophytic mutants. Although significant pro- 
gress has been made towards understanding pollen devel- 
opment at each development stage, yet the dynamic 
regulatory network remains further characterized, the 
transcription factors and their target genes involved in the 
dynamic processes need investigation in deeper. 

By taking advantage of NCA, we explored the regula- 
tory relationships between those TFs and their target 
genes specifically involved in the A. thaliana pollen 
development process. We identified new regulatory rela- 
tionships with our most comprehensive dynamic regula- 
tory networks, which provide new information to 
uncover the underlying mechanisms for the pollen 
development. 

Results and discussion 

When predicting interactions between TFs and their tar- 
get genes based on gene expression profile, a key 
assumption is that mRNA expression level is informative 
in the prediction of protein activity. Although expres- 
sion levels between mRNAs and their corresponding 
proteins in different cell types exhibit a range of correla- 
tions for different genes [23], an overall positive correla- 
tion between mRNA and protein expression levels has 
been identified [24,25], therefore, we adopt this strategy 
in our study. 

The NCA requires two inputs: a time series of gene 
expression profiles and a pre-defined regulatory 
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network. The original gene expression data are obtained 
from the Arabidopsis Information Resource (TAIR) and 
Gene Expression Omnibus (GEO) of NCBL They cover 
seven A. thaliana pollen developmental stages with 23 
profiles in total for wild type Columbia (Col-0): unin- 
ucleate microspores (UM), bicellular pollen (BP), tricel- 
hilar pollen (TP), mature pollen (MP), hydrated pollen 
grains (HP), 0.5 hours germinated pollen tubes (0.5 hr), 
and 4 hours germinated pollen tubes (4 hr). Those data- 
sets of pollen developmental stages were generated by 
three labs [14-16], each of which includes at least one 
MP sample as control. In order to make comparison 
between datasets from different labs, the MP sample 
from that lab is used as the control to process the 
related dataset, and only the fold change values of each 
gene from each dataset is kept for the future calculation. 

The insufficiency of the availability and comparability 
of A. thaliana pollen development expression data limit 
the power of NCA. To overcome the limitation, besides 
we take the mature pollen expression data as the control 
from the same experiment, we also collect the pollen 
development-related transcription factors from the 
Database of Arabidopsis Transcription Factors (DATF), 
The Arabidopsis Gene Regulatory Information Server 
(AGRIS), and the Plant Transcription Factor Database 
(PlnTFDB). 

In NCA, the pre-defined regulatory network initially 
accounts for the gene expression response. The regula- 
tory relationships between the transcription factors and 
their target genes can be collected from published litera- 
tures and transcriptional factors related databases [26]. 
From the three databases mentioned above, we collect 
2, 283 transcription factors which can be mapped to 
microarray probes. We also collect 8 interaction pairs 
between transcription factors specific for A. thaliana 
pollen development through text-mining. However, the 
interaction data between transcription factors and their 
target genes in pollen development is very limited. 
Therefore, we have not enough prior interactions avail- 
able for NCA. To overcome this limitation, we use the 
microarray data to explore the potential regulatory 
interactions according to the correlation coefficient (r) 
of each pair of transcription factors and the fold change 
(FC) of each gene under different conditions. We choose 
those gene pairs with correlation coefficient |r|>0.9 and 
the genes with |FC|>1.6. To reduce false positive data, 
all differentially expressed genes (DEGs) are hierarchi- 
cally clustered by FC values, and those genes with high 
correlation are grouped into corresponding clusters. The 
resulting clusters indicate that all the genes under a 
cluster can be regulated by the related TF. Taking the 
correlation coefficient as control strength for NCA, we 
define a matrix of regulatory relationships between the 
selected TFs and their target genes, and generate a 



regulatory network for the pollen development. The reg- 
ulatory network includes 289 transcription factors, 5530 
target genes and 429, 790 regulatory relations. Processed 
by NCA, we obtain 15 TFs and 101 target genes. 
Because of the inability of NCA to predict the regulatory 
pattern of transcription factors, we take the positive cor- 
relation between TF and its target genes as positive reg- 
ulation, and negative correlation as negative regulatory 
relation. Based on the network and the expression data, 
we further estimate the activities of the transcription 
factors in the network over pollen development with 
NCA and characterize the dynamic regulatory network. 
NCA decomposes the matrix of gene expression values 
into two matrixes, one matrix represents the influence 
of a transcription factor on a target gene and another 
reflects the activities of transcription factor [9]. 

Transcription factor activities under different pollen 
development stages 

The activities of 15 TFs clearly show stage-specific 
actions in pollen and pollen tube development. 12 of 
them (AT4G17490, AT5G43990, AT5G05410, 
AT5G04760, AT3G49530, AT5G03510, AT3G63360, 
AT4G26440, AT3G20670, AT3G24500, AT1G01720, 
AT1G52520) are activated during pollen development, 
while the genes for the rest 3 TFs (AT3G63350, 
AT4G00130, AT3G04100) remain relatively high expres- 
sion without significant change (Figure 1). AT4G 17490 
(ATERF6) gene, encoding the ethylene responsive ele- 
ment binding factor 6 [22], belongs to AP2-EREBP gene 
family and shows its maximum activity at 0.5 hr with a 
slight decrease at 4 hr in pollen tube development. Pre- 
vious research has indicated that members in AP2- 
EREBP gene family play a role in floral organ identity 
determination [27]. AT5G43990 (SUVR2) gene, its pro- 
duct acting as a histone-lysine N-methyltransferase/zinc 
ion binding factor [22], is expressed during the fourth 
anthesis [28], reaching its peak expression at TCP stage 
and returning to baseline at 4 hr stage during pollen 
development. SUVR2 is one of SUVR family protein, 
which can act in concert to achieve various functional 
H3K9 methylation states that will eventually lead to 
DNA methylation in a locus-specific manner (Mutskov 
and Felsenfeld 2004). The up-regulation of SUVR gene 
in the specific stage of pollen development indicates the 
involvement of histone remodification in the gene 
expression switch and regulation rewiring at the epige- 
netic level during the process. Gene AT5G05410 
(DREB2A) is expressed in pollen tube cell, and its activ- 
ity steadily increased from BCP to HP. DREB2A is an 
important transcription factor that has been confirmed 
to involve in heat or water stress-inducible gene expres- 
sion of A. thaliana. It specifically interacts with cis-act- 
ing dehydration-responsive element/C-repeat (DRE/ 
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Figure 1 Transcription factor activities calculated using NCA. (A) Predicted activities of the fifteen transcription factors used in this study. For 
each transcription factor, rows represent development stage. Activities of each row are normalized to the MP stage. (B) Transcription factor 
activities (red) compared to gene expression (blue), with Pearson correlation coefficients noted. Both activity and expression at each time point 
are normalized to the MP stage values, and the activity is further scaled for direct comparison with the expression values. (C) Correlation matrix 
between transcription factor activities. Red represents positive correlation, and blue represents negative correlation. (D) Inferred combinatorial 
regulation pairs of transcription factors. Red lines represent positive regulation, and blue lines represent negative regulation. Green square 
represents TFs associated with pollen development found by text-mining (The regulation of these TFs are putative). 

V ) 



CRT), thus functions in cold and drought stress-respon- 
sive gene expression in A. thaliana [29] . The expression 
pattern of DREB2A gene indicates that some cold and 
drought stress related biological processes are also 
involved in the pollen tube cell development and 
growth. AT5G04760 (MUK11.7) expression is detected 
in germinated pollen grain and pollen tube cell, and 
exhibits a sharp increase from MP to HP stage. 
AT5G03510 (F12E4.290), a C2H2-type zinc finger family 
protein, changes its gene expression from HP stage. As 
a member of heat stress transcription factor family, 
AT3G63350 (HSFA7B) has been shown to be expressed 
during the fourth anthesis stage [28], and down-regu- 
lated at BCP, HP stage and eventually return to its base 
level. AT3G62260 gene (T17J13.220, encoding a protein 
phosphatase 2c family protein), which expression has 



been reported during the fourth anthesis stage as 
AT3G63350 does [28], is turned on at TCP stage. 
AT3G49530 (NTL6), auto-stimulated in pollen tube cell 
development [30], is up-regulated at HP stage. 
AT4G26440 (ATWRKY34, a member of WRKY tran- 
scription factor family), which gene expression has been 
detected in anther and pollen tube cell [28], is activated 
at BCP. Its gene expression has been confirmed as pol- 
len specific [31-33]. AT4G00130 (F6N15.6) gene pre- 
sents a rapidly reduced activity from BCP to HP and a 
sharp increase from HP to 0.5 hr stage. AT3G20670 
(HTA13) gene, which is expressed in pollen tube cell, 
increases its expression steadily from UNM to HP stage. 
AT3G24500 (MBF1C) is a key regulator of a coordi- 
nated heat stress-response network involving SA-, treha- 
lose- and ethylene-signaling pathways, and its gene is 
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expressed in pollen tube; its expression is steadily 
increased from BCP and reaches its peak at HP stage. 
AT3G04100 (ATAF1) belongs to a large family of puta- 
tive transcriptional activators with NAC domain; its 
expression is detected in pollen tube cell and deacti- 
vated from BCP stage. As the same family as 
AT3G04100 with NAC domain, AT1G01720 (ATAF1) 
gene also shows its expression in pollen tube cell, it is 
steadily up-regulated from BCM and reaches its peak 
expression at HP stage. ATAF1 has been proposed to 
modulate plant ABA signaling and high ATAF1 expres- 
sion has been considered to contribute to ABA hyper- 
sensitivity in Arabidopsis [34]. AT1G52520 (FRS6), 
which potentially acts as positive regulators in phyB sig- 
naling pathway controlling flowering time [35], is stea- 
dily up-regulated from UNM and reaches its peak 
expression at 0.5 hr stage. 

The correlations between gene expressions for tran- 
scription factors and their activities are not identical 
among all the transcription factors. Five transcription 
factors (AT4G17490, AT5G03510, AT3G62260, 
AT1G52520, AT3G20670) present strong positive corre- 
lation between their activities and expressions (r > 0.5), 
when three transcription factors (AT5G43990, 
AT5G04760, AT4G26440) show strong negative correla- 
tion (r < -0.5). However, the rest seven TFs 
(AT3G63350, AT5G05410, AT3G49530, AT3G24500, 
AT3G04100, AT1G01720, and AT4G00130) display less 
consistence or no correlation at all (|r|< 0.5). 

Since the linear model of gene expression upon which 
NCA rests does not reveal the relationships between 
transcription factors, we search all the transcription fac- 
tor pairs with high correlation (|r|> 0.5) from the pro- 
tein-protein interactions catalogued in the A. thaliana 
Protein Interactome Database [36]. However, no protein- 
protein interaction has been recorded for any pair of the 
15 TFs. Although no experimental data confirms the 
direct interactions between those TFs, the high correla- 
tions between some TFs under different development 
states suggest their possible relations. Interestingly, the 
correlation matrix between transcription factor activities 
reveals that two sets of TFs' activities are apparently posi- 
tively correlated. One set includes 6 TFs: HSFA7B, 
AT3G62260, FRS6, ERF6, AT4G00130, and AGL57, 
another includes WRKY34, AT3G04760, SUVR2. 
Although no experimental data supports that the TFs in 
each set form direct interaction, the results inferred from 
NCA represent an indirect evidence of the interaction or 
cooperation among them. 

Regulatory influence matrix and gene expression 
clustering 

According to the assumptions of NCA, the target gene 
expression is controlled by an adjusted strength matrix 



and the transcription factor activities. The assigned 
quantitative values of the adjusted strength are able to 
be used to obtain more biologically meaningful clusters 
than by using target genes' expression. Based on their 
expressions, the target genes are hierarchically clustered 
with the adjusted strengths of transcription factors (Fig- 
ure 2A). In total, eleven major clusters are identified 
(Additional file 1), which represents the coordinated 
actions of transcription factors to regulate the gene 
expression. Cluster 4, 7, 8, and 9 highlight the influence 
of single TF on a set of genes, whereas cluster 3, 11, 10, 
and 5 display a set of TFs influence on a set of genes. 
Interestingly, the regulatory relationships from the clus- 
ters can also disclose the auto-regulation of the tran- 
scription factors. For example, in the cluster 4, it reveals 
that the gene AT3G04100 (AGL57), which encodes a 
MADS-box family protein, is also a target of its own 
protein, and the same as AT1G01720 in cluster 8, 
AT1G01720 in cluster 9 and AT4G00130 in cluster 12, 
as well as AT5G43990, AT5G04760, AT3G63350, 
AT3G49530 in cluster 3. Those self-regulations are 
unable to be identified from the coexpression approach. 
NCA shows certain advantages and the auto-regulation 
can be inferred from clustering on the matrix of regula- 
tion influence. 

On the other hand, clustering by regulatory strength 
can identify new clusters unobtainable by clustering the 
expression data alone. For example, cluster 9 and 5 
could not be distinguished when clustering is applied to 
the gene expression data alone (Figure 2B). In contrast, 
those two groups can be separated with clustering on 
the regulatory strength matrix, and are linked to the 
regulatory influence of transcription factor DREB2A, 
HTA13 and NTL6. For the target genes FZR2 and 
SVR1, they cannot be grouped together with the cluster- 
ing method on the gene expression data alone (Figure 
2B), but they are grouped into cluster 3 based on regu- 
latory strength and supposedly regulated by transcrip- 
tion factors SUVR2, AT5G04760, HSFA7B, AT3G62260, 
NTL6, HTA13, MBF1C, and FRS6. Furthermore, the 
clustering of the NCA-processed strength matrix 
adjusted from the initial connectivity matrix can group 
genes with different expression patterns (Figure 2A and 
2C). 

Our results further demonstrate that the estimated 
transcriptional regulation strengths have certain advan- 
tages over the gene coexpression approaches for explor- 
ing the regulatory relationships and can provide a new 
insight to the regulatory relations of between transcrip- 
tion factors and their target genes. 

Coexpression analysis of the regulatory gene sets 

Each pair of TF and its target gene(s) classified by NCA 
have a high correlation coefficient (|p|>0.9) based on 
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Figure 2 Hierarchical clustering in the context of a defined regulatory network. (A) The adjusted strength matrix is used for clustering, 
with gene expression matrix appended. In the adjusted strength matrix heatmap, red color indicates positive regulatory influence, while blue 
color indicates negative regulatory influence. (B) Clustering with gene expression only. (C) Clustering with the binary regulatory relations (initial 
connectivity matrix), assuming the absolute values of all regulatory strengths are equal. 



gene expression. Considering that our identified regula- 
tory relationships between each TF and its target genes 
are derived only from process of pollen development, 
we further test the robustness of the coexpression under 
other conditions, such as tissue, abiotic and light 



conditions. We explore each pair of the TFs and its tar- 
get gene(s) inferred from NCA in ATTED [37] which is 
a database of gene coexpression in Arabidopsis under a 
wide variety of experiment conditions, and find 65 coex- 
pression pairs (in total 472 identified pairs) with 
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correlation coefficient larger than 0.4 (|r|> 0.4), includ- 
ing 8 TFs and 35 target genes. Almost a quarter (15/65) 
of these coexpressions are negative. Since the rest 407 
TF and target gene pairs display the low correlation 
under all other experimental conditions but show a high 
correlation in pollen development process, it is reason- 
able to state that those pairs could be specific in pollen 
development. There are 5 clusters with more than one 
TF in each cluster. We search the coexpression for 
those TFs in each cluster, and find 9 pairs of TFs to 
present the relatively significant coexpression (in total 
15 TFs; |r| >0.4). Almost all pairs of those coexpressed 
TFs are positively correlated, except one pair between 
At5g04760 and At5g43990 in cluster 3 (r = -0.41). In 
addition, we also search every pair of target genes in 
each cluster for the coexpression, and find 118 coex- 
pression pairs with 6 highly correlated ones (r>0.8), 
which implies that the rest 112 pairs of coexpression 
genes in each cluster could be specific in the related 
stage of pollen development process. 

The regulatory dynamics of pollen development 

According to the relationships inferred from NCA, we 
built an integrated model of A. thaliana pollen develop- 
ment (Figure 3). The final dynamic network integrates 
the inferred transcription factor activities, the regulatory 
relationships between TFs and their target genes, clus- 
tering on the adjusted strengths, the gene expression 
profiles, and the text-mining data. The network includes 
19 TFs and 101 target genes. Several transcription fac- 
tors present their specific dynamic expression pattern 



during the pollen development. For example, the expres- 
sion of AT5G04760 is not detectable during UNM 
development stage, while AGL18, OFP1, TSOl and 
MYB65 are not expressed during TCP, HP, 0.5 hr and 4 
hr development stages. The rest genes present their 
expression during all of the pollen development pro- 
cesses and display different expression at least ones. 

AT5G04760 is found no expression at UNM stage. 
From UNM to BCP stage, AT5G04760 is activated and 
interacts with SUVR2 to regulate their downstream 
gene expression. In contrast, AGL57 is deactivated 
during the stage switch. By the end of BCP stage, 
AT5G04760 and AGL57 have already executed their 
function and affected gene expression, including the 
genes in clusters 3, 4, 5, 10, and 11. From BCP to TCP 
stage, all genes show trends of not differently 
expressed. The pollen in TCP stage is similar to MP 
stage since the number of DEGs detected in both 
stages is very small. For transcription factors AGL18, 
OFP1, TSOl, and MYB65, they are curated to play the 
roles in pollen development from literature and there- 
fore incorporated in the regulatory network. Those 
transcription factors show no detectable expression 
until into the 4 hr stage. Another transcription factor, 
DREB2A, is dramatically deactivated from the begin- 
ning. After TCP stage, DREB2A keeps steadily acti- 
vated; until HP stage, it begins to restore to their basal 
level of activity. The temporal model therefore pro- 
vides a global view of TFs' activation and the regula- 
tory relationships between TFs and their target genes 
during the pollen development of A. thaliana. 
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Figure 3 A dynamic network of transcription during A. thaliana pollen development. The pollen development of A. thaliana ranges from 
UNM, BCP, TCP, MP, HP, to 0.5 hr pollen tube, and 4 hour pollen tube stages. The transcription factors are represented as a square, and target 
genes as a circle. Blue or red arrow lines show the influence of a transcription factor on a target gene, positively or negatively. The transcription 
factors, that are not processed by NCA but collected by text-mining, include AGL18, OFP1, TS01 and MYB65. The genes with no expression are 
denoted with green line. 1 1 clusters are grouped together in total. 
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The transcription networks have been proven to be 
made up of a small set of recurring regulation patterns 
that are called network motifs, and they serve as basic 
building blocks of transcription networks. To obtain the 
regulation pattern during pollen development, we detect 
network motifs in the network. In total, we retrieve 11 
network motifs for motif size 3, 82 motifs with motif 
size 4, and 778 motifs with motif size 5. Each motif 
embodies a regulation pattern. And most all of the TFs 
display different roles in more than one regulation pat- 
tern. We detect the network motifs for all pollen devel- 
opment stage and find some interesting TF interactions 
(Figure 4). 

For example, MBF1C, which expresses in pollen tube 
and enhances the tolerance to various biotic and abiotic 



stresses [38], displays the pattern of up-regulates 
AT3G62260 and down-regulates NTL6. AT3G62260 
functions as protein serine/threonine phosphatase activ- 
ity and NTL6 undergoes proteolytic processing. Our 
result indicates that MBF1C regulates protein serine/ 
threonine phosphorylation and proteolysis in the oppo- 
site direction. Since phosphorylation plays an important 
role in the pollen-stigma interaction [39] and 
AT3G62260 is upregulated before TCP stage, it can be 
anticipated that MBF1C promotes the pollen-stigma 
recognition. 

According to the network motif, WRKY34 upregulates 
other 3 target genes: FER3, RHD2 and GRP4 in the pol- 
len development. FER3 has been reported to protect 
cells against oxidative damage [40] and RHD2 can lead 
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the formation of reactive oxygen species [41], whereas 
overexpression of GRP4 can increase plant tolerance to 
osmotic stress [42]. 

Therefore, as a gene solely expressed in pollen, 
WRKY34 potentially promotes the expression of FER3, 
RHD2 and GRP4, which may function as a module to 
balance the reactive oxygen species metabolism during 
the process. 

Conclusions 

The ultimate goal of our work is to construct a dynamic 
regulation network of pollen development. With NCA, 
we have predicted the activities of 15 transcription fac- 
tors and the regulatory strengths of those TFs to their 
target genes. Based on the regulatory strength matrix, 
we have clustered the coexpressed and coregulated 
genes into different groups. By incorporating the regula- 
tory network information with the regulatory strength 
matrix, we have further inferred the activities and inter- 
actions between transcription factors and their target 
genes. 

The regulatory strength matrix is clustered to deter- 
mine gene groups which are not only co-expressed, but 
also co-regulated. Identification of interactions between 
TFs and their target genes enable us to interpret the 
activation of regulatory relationship over development 
stage. Beyond the 15 TFs, we have also identified addi- 
tional 4 TFs and explored the special expression pattern 
of the 4 TFs that are not included in the model, but are 
pollen development-related by text-mining. Moreover, 
WRKY34, which has been reported only expressed in 
pollen [43], has also been identified by NCA. We finally 
have reconstructed the dynamics of pollen development 
process of A. thaliana using above results. Moreover, 
we present the dynamic regulatory networks over all 
explored pollen development stages. 

Although the NCA we used in this work can infer 
hidden TF activities by taking advantages of the prior of 
network structure, most of the regulatory information 
however is not available and the regulatory pairs 
retrieved from coexpression tend to be hypothetical. In 
addition, NCA is based on a phenomenal model of TFs' 
regulatory over target genes, which correlates with Hill 
cooperation between TFs, which do not potentially 
reflect the biological reality if we consider the complex- 
ity and multi-steps of the transcription event [44]. 
Nevertheless, in this study we combine all available 
datasets and construct a comprehensive dynamic net- 
work of the A. thaliana pollen development. This net- 
work characterizes the stage-specific activities of TFs of 
importance and the corresponding dynamics of this net- 
work during the stage of development. New relations 
between transcription factors and their target genes 
have been inferred from the network. Obviously, this 



network will shed new light on the study of mechanisms 
that governing the development of the pollen. 

Methods 

Data preprocessing 

The gene expression datasets were obtained from Gene 
Expression Omnibus (GEO), with accession numbers: 
GSE6162, GSE6696, and GSE17343. The log2 ratio of 
genes expression in each development stage was calcu- 
lated by MAS5 [45], with significance as p-value < 0.01. 
For all development stages we explored, the genes with 
at least differentially expressed at one stage were 
selected. In total, 5, 980 genes, which were differentially 
expressed (|FC|>1.6), were selected to be hierarchically 
clustered by hcluster of R language and to calculate the 
correlation coefficient for each pair of genes. For each 
pair of TF and its target gene, only the target gene in 
the sub-tree of the TF-node with the coefficient larger 
than 0.9 was kept for NCA. 

Network component analysis and dynamic network 
construction 

Network component analysis (NCA) is a powerful math- 
ematical tool for uncovering hidden regulatory signals 
from gene expression levels with a prior network struc- 
ture information in terms of matrix decomposition [46]. 
The classical decomposition methods, such as PCA and 
ICA, assume orthogonality and independence, respec- 
tively, all of which lack biological foundation. On the 
other hand, the NCA does not make any assumptions 
on statistical properties and allows proper handling of 
prior network connectivity information. 

NCA uses the standard log-linear model to approxi- 
mate the relationship between levels of TFs activity and 
that of the target-gene expression by assuming the Hill 
cooperation between TFs on the promoter region of tar- 
get genes. Formally: 

m rr \ CSi ' m 

Where t represents the time stage, E t {t) is the gene 
expression level and TFAj(t) is TF j's activities and cs t j 
reflects the control strength of TF j on gene i. 

After logarithm, the equation (1) is linearized into (in 
forms of matrix): 

log[£r] = [CS]log[TFAr] (2) 

While the matrix [Er] consists of elements [Er] t j = Ey 
(t)/Eij(0) and similarly [TFArhj = TEA t j(t)/ TEA ^ ( 0 ) , 
represents the relative gene expression levels and TFs' 
activities. The dimension of [Er] is N x M (N genes and 
M samples or conditions) while that of [TFAr] is L x M 
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(L TFs). They respectively indicate the time courses of 
relative gene expression levels and TFs' activities. 
Finally, size of [CS] is N x L, which is the control 
strength for L TFs on each of N genes. The equation (2) 
above can be further simplified as: 

[E] = [S][A] (3) 

Here, we have the strength matrix, [S], which corre- 
sponding to the term of [CS] in equation (2) and the 
TFs' activity matrix [A], which is the equivalent of log 
[TFAr] in the equation (2), and finally, the gene expres- 
sion matrix of [E] corresponds to the term of log[£r] in 
equation (2). 

Based on above preparation, the decomposition of [E] 
into [S] and [A] can be achieved by minimizing the fol- 
lowing object function: 

min||([E]-[S][A])|| 
Subject to. S £ Zq 

In NCA, the above target function is estimated by 
using the bootstrap algorithm and the value of [S] and 
[A] can be normalized through a nonsingular matrix of 
[X] according to, 

[B] = [S][A] = [S][X][X- 1 ][A] (5) 

Specifically, to guarantee uniqueness of the solution 
for the matrix decomposition of Eq. 4, the network 
topology needs to satisfy some criteria [9]: (i) The con- 
nectivity matrix [A] must have full-column rank, (ii) 
When a node in the regulatory layer is removed along 
with all of the output nodes connected to it, the result- 
ing network must be characterized by a connectivity 
matrix that still has full-column rank, (iii) [P] must have 
full row rank. 

The algorithm of NCA is already implemented in 
MATLAB by the authors, which is downloadable at 
http://www.seas.ucla.edu/~liaoj/. In this study, we fol- 
lowed the manual of this package and performed our 
computation. 

With NCA, the significant TFs and their target genes 
were detected, the control strength of TFs to their target 
genes was recalculated, and the activities of the TFs 
were estimated. We took the control strength (only as 
positive or negative) as the regulatory relationships 
between TFs and their target genes (including TFs), and 
the TFs activities substitute for their gene expression to 
construct the dynamic network. 

Over-presented motifs among network 

Motifs are small connected sub-networks that a network 
displays in significantly higher frequencies than would be 
expected for a random network. To uncover the regulation 



pattern of dynamic regulation network, we took FAN- 
MOD [47,48] to detect the over-presented motifs. 

Additional material 



Additional file 1: Major clusters formed from the adjusted strength 
matrix and the target genes' GO functions. Cluster of the genes. 



Acknowledgements 

We would like to thank Peng Li and Feng Xu, who helped to fix the bugs 
when programming. This work was supported by the National 973 Key Basic 
Research Program (Grant Nos. 2007CB1 08800 and 2010CB945401), and the 
National Natural Science Foundation of China (Grant No. 30870575, 
30730078, 31000590) and the Science and Technology Commission of 
Shanghai Municipality (1 1 DZ2260300). 

This article has been published as part of BMC Systems Biology Volume 5 
Supplement 3, 2011: BIOCOMP 2010 - The 2010 International Conference on 
Bioinformatics & Computational Biology: Systems Biology. The full contents 
of the supplement are available online at http://www.biomedcentral.com/ 
1752-0509/5?issue=S3. 

Author details 

College of Life Sciences, Northeast Forestry University, Heilongjiang, Harbin 
150040, China. 2 The Center for Bioinformatics and The Institute of 
Biomedical Sciences, School of Life Sciences, East China Normal University, 
Shanghai 200241, China. 3 Wuhan University of Science and Technology, 
Wuhan, Hubei 430081, P.R. China, department of Internal Medicine, Rush 
University Medical Center, Chicago, Illinois 60612, USA. 5 Shanghai 
Information Center for Life Sciences, Shanghai Institutes for Biological 
Sciences, Chinese Academy of Science, Shanghai 200031, China. 

Authors' contributions 

Data collection: JGW. Programming: JGW. Design of the analysis process: TLS, 
JGW. Data analysis: JGW, TLS, YD, XJQ. Paper Writing: JGW, XJQ, YD, TLS and 
YHL Paper Finalizing: TLS and YHL All authors read and approved the final 
manuscript. 

Competing interests 

The authors declare that they have no competing interests. 
Published: 23 December 2011 

References 

1. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of 
gene expression patterns with a complementary DNA microarray. 
Science 1995, 270:467-470. 

2. Ren B, Robert F, Wyrick JJ, Aparicio 0, Jennings EG, Simon I, Zeitlinger J, 
Schreiber J, Hannett N, Kanin E, et al: Genome-wide location and function 
of DNA binding proteins. Science 2000, 290:2306-2309. 

3. Kim TH, Ren B: Genome-wide analysis of protein-DNA interactions. Annu 
Rev Genomics Hum Genet 2006, 7:81-102. 

4. Tong Y, Falk J: Genome-wide analysis for protein-DNA interaction: ChlP- 
chip. Methods Mol Biol 2009, 590:235-251. 

5. Alter 0, Brown PO, Botstein D: Singular value decomposition for genome- 
wide expression data processing and modeling. Proc Natl Acad Sci USA 
2000, 97:10101-10106. 

6. Herrgard MJ, Covert MW, Palsson BO: Reconstruction of microbial 
transcriptional regulatory networks. Curr Opin Biotechnol 2004, 15:70-77. 

7. Kalir S, Alon U: Using a quantitative blueprint to reprogram the dynamics 
of the flagella gene network. Cell 2004, 117:713-720. 

8. Wang X, Wang H, Xie J: Genes and regulatory networks involved in 
persistence of Mycobacterium tuberculosis. Sci China Life Sci 201 1, 
54:300-310. 

9. Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C, Roychowdhury VP: Network 
component analysis: reconstruction of regulatory signals in biological 
systems. Proc Natl Acad Sci USA 2003, 100:15522-15527. 



Wang et al. BMC Systems Biology 201 1, 5(Suppl 3):S8 
http://www.biomedcentral.eom/1752-0509/5/S3/S8 



Page 11 of 1 1 



10. Chen SF, Juang YL, Chou WK, Lai JM, Huang CY, Kao CY, Wang FS: Inferring 
a transcriptional regulatory network of the cytokinesis-related genes by 
network component analysis. BMC Syst Biol 2009, 3:1 10. 

11. Kao KC, Yang YL, Boscolo R, Sabatti C, Roychowdhury V, Liao JC: 
Transcriptome-based determination of multiple transcription regulator 
activities in Escherichia coli by using network component analysis. Proc 
Natl Acad Sci USA 2004, 101:641-646. 

12. MacLennan NK, Rahib L, Shin C, Fang Z, Horvath S, Dean J, Liao JC, 
McCabe ER, Dipple KM: Targeted disruption of glycerol kinase gene in 
mice: expression analysis in liver shows alterations in network partners 
related to glycerol kinase activity. Hum Mol Genet 2006, 15:405-415. 

13. Seok J, Xiao W, Moldawer LL, Davis RW, Covert MW: A dynamic network of 
transcription in LPS-treated human subjects. BMC Syst Biol 2009, 3:78. 

14. Honys D, Twell D: Transcriptome analysis of haploid male gametophyte 
development in Arabidopsis. Genome Biol 2004, 5:R85. 

15. Wang Y, Zhang WZ, Song LF, Zou JJ, Su Z, Wu WH: Transcriptome 
analyses show changes in gene expression to accompany pollen 
germination and tube growth in arabidopsis. Plant Physiol 2008, 
148:1201-1211. 

16. Qin Y, Leydon AR, Manziello A, Pandey R, Mount D, Denic S, Vasic B, 
Johnson MA, Palanivelu R: Penetration of the stigma and style elicits a 
novel transcriptome in pollen tubes, pointing to genes critical for 
growth in a pistil. PLoS Genet 2009, 5. 

17. Miyazaki S, Murata T, Sakurai-Ozato N, Kubo M, Demura T, Fukuda H, 
Hasebe M: ANXUR1 and 2, sister genes to FERONIA/SIRENE, are male 
factors for coordinated fertilization. Curr Biol 2009, 19:1327-1331. 

18. Xin H, Sun M: What we have learned from transcript profile analyses of 
male and female gametes in flowering plants. Sci China Life Sci 2010, 
53:927-933. 

19. lida K, Seki M, Sakurai T, Satou M, Akiyama K, Toyoda T, Konagaya A, 
Shinozaki K: RARTF: database and tools for complete sets of Arabidopsis 
transcription factors. DNA Res 2005, 12:247-256. 

20. Davuluri RV, Sun H, Palaniswamy SK, Matthews N, Molina C, Kurtz M, 
Grotewold E: AGRIS: Arabidopsis gene regulatory information server, an 
information resource of Arabidopsis cis-regulatory elements and 
transcription factors. BMC Bioinformatics 2003, 4. 

21. Guo AY, He K, Liu D, Bai SN, Gu XC, Wei LP, Luo JC: DATF: a database of 
Arabidopsis transcription factors. Bioinformatics 2005, 21:2568-2569. 

22. Perez-Rodriguez P, Riano-Pachon DM, Correa LG, Rensing SA, Kersten B, 
Mueller-Roeber B: PlnTFDB: updated content and new features of the 
plant transcription factor database. Nucleic Acids Res 2010, 38:D822-827. 

23. Pascal L, True L, Campbell D, Deutsch E, Risk M, Coleman I, Eichner L, 
Nelson P, Liu A: Correlation of mRNA and protein levels: cell type-specific 
gene expression of cluster designation antigens in the prostate. BMC 
Genomics 2008, 9:246. 

24. Guo Y, Xiao P, Lei S, Deng F, Xiao GG, Liu Y, Chen X, Li L, Wu S, Chen Y: 
How is mRNA expression predictive for protein expression? A correlation 
study on human circulating monocytes. Acta Biochim Biophys Sin 
(Shanghai) 2008, 40:426-436. 

25. Tian Q, Stepaniants SB, Mao M, Weng L, Feetham MC, Doyle MJ, Yi EC, 
Dai H, Thorsson V, Eng J, et al: Integrated genomic and proteomic 
analyses of gene expression in mammalian cells. Mol Cell Proteomics 
2004, 3:960-969. 

26. Bulow L, Brill Y, Hehl R: AthaMap-assisted transcription factor target gene 
identification in Arabidopsis thaliana. Database (Oxford) 2010, 2010: 
baq034. 

27. Riechmann JL, Meyerowitz EM: The AP2/EREBP family of plant 
transcription factors. Biol Chem 1998, 379:633-646. 

28. Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, 
Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al: The Arabidopsis 
Information Resource (TAIR): gene structure and function annotation. 
Nucleic Acids Res 2008, 36:D1009-1 014. 

29. Sakuma Y, Maruyama K, Osakabe Y, Qin F, Seki M, Shinozaki K, Yamaguchi- 
Shinozaki K: Functional analysis of an Arabidopsis transcription factor, 
DREB2A, involved in drought-responsive gene expression. Plant Cell 2006, 
18:1292-1309. 

30. Seo PJ, Kim MJ, Park JY, Kim SY, Jeon J, Lee YH, Kim J, Park CM: Cold 
activation of a plasma membrane-tethered NAC transcription factor 
induces a pathogen resistance response in Arabidopsis. Plant J 2010, 
61:661-671. 



31. Eulgem T, Rushton PJ, Robatzek S, Somssich IE: The WRKY superfamily of 
plant transcription factors. Trends Plant Sci 2000, 5:199-206. 

32. Zou C, Jiang W, Yu D: Male gametophyte-specific WRKY34 transcription 
factor mediates cold sensitivity of mature pollen in Arabidopsis. J Exp 
Bot 2010, 61:3901-3914. 

33. Honys D, Oh SA, Renak D, Donders M, Solcova B, Johnson JA, Boudova R, 
Twell D: Identification of microspore-active promoters that allow 
targeted manipulation of gene expression at early stages of 
microgametogenesis in Arabidopsis. BMC Plant Biol 2006, 6:-. 

34. Wu YR, Deng ZY, Lai JB, Zhang YY, Yang CP, Yin BJ, Zhao QZ, Zhang L, Li Y, 
Yang CW, Xie Q: Dual function of Arabidopsis ATAF1 in abiotic and biotic 
stress responses. Cell Res 2009, 19:1279-1290. 

35. Lin RC, Wang HY: Arabidopsis FHY3/FAR1 gene family and distinct roles 
of its members in light control of arabidopsis development. Plant Physiol 
2004, 136:4010-4022. 

36. Li P, Zang WD, Li YH, Xu F, Wang JG, Shi TL: AtPID: the overall hierarchical 
functional protein interaction network interface and analytic platform 
for Arabidopsis. Nucleic Acids Res 201 1, 39:D1 130-D1 133. 

37. Obayashi T, Kinoshita K, Nakai K, Shibaoka M, Hayashi S, Saeki M, Shibata D, 
Saito K, Ohta H: ATTED-II: a database of co-expressed genes and cis 
elements for identifying co-regulated gene groups in Arabidopsis. 
Nucleic Acids Res 2007, 35:D863-869. 

38. Suzuki N, Bajad S, Shuman J, Shulaev V, Mittler R: The transcriptional co- 
activator MBF1c is a key regulator of thermotolerance in Arabidopsis 
thaliana. J Biol Chem 2008, 283:9269-9275. 

39. Hiscock SJ, Doughty J, Dickinson HG: Synthesis and phosphorylation of 
pollen proteins during the pollen-stigma interaction in self-compatible 
Brassica napus L. and self-incompatible Brassica oleracea L. Sexual Plant 
Reproduction 1995, 8:345-353. 

40. Petit JM, Briat JF, Lobreaux S: Structure and differential expression of the 
four members of the Arabidopsis thaliana ferritin gene family. Biochem J 

2001, 359:575-582. 

41. Foreman J, Demidchik V, Bothwell JH, Mylona P, Miedema H, Torres MA, 
Linstead P, Costa S, Brownlee C, Jones JD, et al: Reactive oxygen species 
produced by NADPH oxidase regulate plant cell growth. Nature 2003, 
422:442-446. 

42. Kwak KJ, Kim YO, Kang H: Characterization of transgenic Arabidopsis 
plants overexpressing GR-RBP4 under high salinity, dehydration, or cold 
stress. J Exp Bot 2005, 56:3007-3016. 

43. Zou C, Jiang W, Yu D: Male gametophyte-specific WRKY34 transcription 
factor mediates cold sensitivity of mature pollen in Arabidopsis. J Exp 
Bot 2010, 61:3901. 

44. Andrecut M, Halley JD, Winkler DA, Huang S: A general model for binary 
cell fate decision gene circuits with degeneracy: indeterminacy and 
switch behavior in the absence of cooperativity. PLoS One 201 1, 6:e1 9358. 

45. Liu WM, Mei R, Di X, Ryder TB, Hubbell E, Dee S, Webster TA, 
Harrington CA, Ho MH, Baid J, Smeekens SP: Analysis of high density 
expression microarrays with signed-rank call algorithms. Bioinformatics 

2002, 18:1593-1599. 

46. Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C, Roychowdhury VP: Network 
component analysis: reconstruction of regulatory signals in biological 
systems. Proc Natl Acad Sci USA 2003, 100:15522-15527. 

47. Wernicke S, Rasche F: FANMOD: a tool for fast network motif detection. 

Bioinformatics 2006, 22:1 1 52-1 1 53. 

48. Wernicke S: Efficient detection of network motifs. IEEE/ACM Trans Comput 
Biol Bioinform 2006, 3:347-359. 



doi:1 0.1 1 86/1 752-0509-5-S3-S8 

Cite this article as: Wang et al.: A transcriptional dynamic network 
during Arabidopsis thaliana pollen development. BMC Systems Biology 
201 1 5(Suppl 3):S8. 



